The University of Houston-Clear Lake established the Research Institute for
Computing and Information Systems (RICIS) in 1986 to encourage the NASA
Johnson Space Center (JSC) and local industry to actively support research
in the computing and information sciences. As part of this endeavor, UHCL
proposed a partnership with JSC to jointly define and manage an integrated
program of research in advanced data processing technology needed for JSC’s
main missions, including administrative, engineering and science
responsibilities. JSC agreed and entered into a continuing cooperative agreement
with UHCL beginning in May 1986, to jointly plan and execute such research
through RICIS. Additionally, under Cooperative Agreement NCC 9-16,
computing and educational facilities are shared by the two institutions to
conduct the research.

The UHCL/RICIS mission is to conduct, coordinate, and disseminate research
and professional level education in computing and information systems to 
serve the needs of the government, industry, community and academia. 
RICIS combines resources of UHCL and its gateway affiliates to research and 
develop materials, prototypes and publications on topics of mutual interest 
to its sponsors and researchers. Within UHCL, the mission is being 
implemented through interdisciplinary involvement of faculty and students 
from each of the four schools: Business and Public Administration, Educa- 
tion, Human Sciences and Humanities, and Natural and Applied Sciences. 
RICIS also collaborates with industry in a companion program. This program 
is focused on serving the research and advanced development needs of
industry. 

Moreover, UHCL established relationships with other universities and
research organizations, having common research interests, to provide
additional sources of expertise to conduct needed research. For example, UHCL
has entered into a special partnership with Texas A&M University to help
oversee RICIS research and education programs, while other research
organizations are involved via the "gateway" concept.

A major role of RICIS then is to find the best match of sponsors, researchers 
and research objectives to advance knowledge in the computing and informa- 
tion sciences. RICIS, working jointly with its sponsors, advises on research 
needs, recommends principals for conducting the research, provides technical
and administrative support to coordinate the research and integrates
technical results into the goals of UHCL, NASA/JSC and industry. 
Computing and Information Systems by James M. Keller of the University of
Technology Branch, Information Technology Division, Information Systems
Directorate, NASA/JSC. 
Introduction


For the fourth and final quarter of this research contract, we report
progress on the following four tasks (as described in the contract):

1. Fuzzy Set Based Decision Methodologies;

2. Membership Calculation;

3. Clustering Methods (including derivation of pose estimation parameters);

4. Acquisition of images and testing of algorithms.

As in the past, the report consists of "stand alone" sections describing the
activities in each task. It does not duplicate the material contained in the previous quarterly 
reports. For details of the earlier work done under this contract, please refer to the first 
three quarterly reports. 


Fuzzy Set Based Decision Methodologies 


In this section, we report on two new fuzzy set based techniques that we developed 
for decision making. These include: 

1. A method to generate fuzzy decision rules automatically for image analysis. 

2. A decision making algorithm based on possibility expectation. 

The following pages contain the details of these two pieces of work. 



A Method to Generate Decision Rules Automatically for Image Analysis 

In this report, we propose a method to generate rules automatically for image analysis tasks such
as segmentation. The method used for segmentation is best described by the following paper,
submitted to the North American Fuzzy Information Processing Society (NAFIPS ’92). For this
report, only the experimental example differs from the original paper.
Automatic Rule Generation for High-Level Vision

Frank Chung-Hoon Rhee and Raghu Krishnapuram
Department of Electrical and Computer Engineering
University of Missouri, Columbia, MO 65211

ABSTRACT 

Many high-level vision systems use rule-based approaches to solve problems such as 
autonomous navigation and image understanding. The rules are usually elaborated by experts. 
However, this procedure may be rather tedious. In this paper, we propose a method to generate 
such rules automatically from training data. The proposed method is also capable of filtering out
irrelevant features and criteria from the rules. 

1. Introduction 

High-level computer vision involves complex tasks such as image understanding and scene 
interpretation. In domains where the models of the objects in the image can be precisely defined, 
(such as the blocks world, or even the world of generalized cylinders) existing techniques for 
description and interpretation perform quite well. However, when this is not the case (such as the 
case of outdoor scenes or extra-terrestrial environments), traditional techniques do not work well. 
For this reason, we believe that the greatest contribution of fuzzy set theory to computer vision will 
be in the area of high-level vision. Unfortunately, very little work has been done in this highly
promising area. Fuzzy set theoretic approaches to high-level vision have the following advantages 
over traditional techniques: i) they can easily deal with imprecise and vague properties, 
descriptions, and rules, ii) they degrade more gracefully when the input information is incomplete, 
iii) a given task can be achieved with a more compact set of rules, iv) the inferencing and the 
uncertainty (belief) maintenance can both be done in one consistent framework, v) they are 
sufficiently flexible to accommodate several types of rules other than just IF-THEN rules. Some
examples of the types of rules that can be represented in a fuzzy framework are [1] possibility rules
("The more X is A, the more possible that B is the range for Y"), certainty rules ("The more X is
A, the more certain Y lies in B"), gradual rules ("The more X is A, the more Y is B"), and unless rules
[2] ("if X is A, then Y is B unless Z is C").

The determination of properties and attributes of image regions and spatial relationships 
among regions is critical for higher level vision processes involved in tasks such as autonomous 
navigation, medical image analysis and scene interpretation. Many high-level systems have been 
designed using a rule-based approach [3,4]. In these systems, common-sense knowledge about the 
world is represented in terms of rules, and the rules are then used in an inference mechanism to
arrive at a meaningful interpretation of the contents of the image. In a rule-based system to interpret 
outdoor scenes, typical rules may be 

IF a REGION is RATHER THIN AND SOMEWHAT STRAIGHT 

THEN it is a ROAD 

IF a REGION is RATHER GREEN AND HIGHLY TEXTURED AND 
IF the REGION is BELOW a SKY REGION 

THEN it is TREES 

Attributes such as "THIN" and "NARROW", and properties such as "BRIGHT" and 
"TEXTURED" defy precise definitions, and they are best modeled by fuzzy sets. Similarly, spatial 
relationships such as "LEFT OF ", "ABOVE" and "BELOW" are difficult to model using the all- 
or-nothing traditional techniques [5]. We may interpret the attributes, properties and relationships 
as "criteria". Therefore, we believe that a fuzzy approach to high-level vision will yield more
realistic results. 

In most rule-based systems, the rules are usually enumerated by experts, although they 
may also be generated by a learning process. Several techniques have been suggested in the 
literature to generate rules for control problems [6-9], some of which use neural net methods to 
model the control system [7-12]. These systems convert a given set of inputs to an output by 
fuzzifying the inputs, performing fuzzy logic, and then finally defuzzifying the result of the 
inference to generate a crisp output [13]. Some of the methods also "tune" the membership 
functions that define the levels (such as "LOW", "MEDIUM" and "HIGH") of the input variables 
[10]. While these methods have been shown to be very effective in solving control problems, they 
cannot be directly used in high-level vision applications. For example, in control systems, the 
fuzzy rules have consequents which are usually a desired level of a control signal, whereas in
high-level vision, the consequent clauses are usually fuzzy labels. Also, it is desirable that membership
functions for levels of fuzzy attributes such as "THIN", and "NARROW", and properties such as 
"BRIGHT" be related to how humans perceive such attributes or properties. Hence they have very 
little to do with the decision making or reasoning process in which they are employed. In many 
reasoning systems for high-level vision, confidence (or importance) factors are associated with 
every rule since the confidence in the labeling may depend on the confidence of the rule itself. In 
this paper, we propose a new method to generate rules for high-level vision applications 
automatically. The rules so obtained may be combined with the rules given by the experts to 
complete the rule base. 

In Section 2, we describe several fuzzy aggregation operators which can be used in 
hierarchical (multi-layer) aggregation networks for multi-criteria decision making. In Section 3, we 
describe how these aggregation networks can be used to filter out irrelevant attributes, properties, 
and relationships and at the same time generate a compact set of fuzzy rules (with associated 
confidence factors) that describes the decision making process. In Section 4 we present some 
experimental results on automatic rule generation. Finally, Section 5 contains the summary and
conclusions. 


2. Fuzzy Aggregation Operators 

Fuzzy set theory provides a host of very attractive aggregation connectives for integrating 
membership values representing uncertain and subjective information [14]. These connectives can 
be categorized into the following three classes based on their aggregation behavior: i) union
connectives, ii) intersection connectives, and iii) compensative connectives. Union connectives 
produce a high output whenever any one of the input values representing different features or 
criteria is high. Intersection connectives produce a high output only when all of the inputs have 
high values. Compensative connectives are used when one might be willing to sacrifice a little on 
one factor, provided the loss is compensated by gain in another factor. Compensative connectives 
can be further classified into mean operators and hybrid operators. Mean operators are monotonic 
operators that satisfy the condition $\min(a,b) \le \mathrm{mean}(a,b) \le \max(a,b)$. The generalized mean
operator [15], given below, is one such operator.

$$ g(x_1, \ldots, x_n;\, p, w_1, \ldots, w_n) = \left( \sum_{i=1}^{n} w_i x_i^{p} \right)^{1/p}, \qquad \sum_{i=1}^{n} w_i = 1. \qquad (1) $$

The $w_i$'s can be thought of as the relative importance factors for the different criteria. The
generalized mean has several attractive properties. For example, the mean value always increases 
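with an increase in p [15]. Thus, by varying the value of p between $-\infty$ and $+\infty$, we can obtain all
values between min and max. Therefore, in the extreme cases, this operator can be used as union
or intersection. The γ-model devised by Zimmermann and Zysno [16] is an example of hybrid
operators, and it is defined by

$$ y = \left( \prod_{i=1}^{n} x_i^{\delta_i} \right)^{1-\gamma} \left( 1 - \prod_{i=1}^{n} (1 - x_i)^{\delta_i} \right)^{\gamma}, \qquad \sum_{i=1}^{n} \delta_i = n, \quad 0 \le \gamma \le 1. \qquad (2) $$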
In general, hybrid operators are defined as the weighted arithmetic or geometric mean of a pair of
fuzzy union and intersection operators as follows. 
$$ A \oplus_\gamma B = (1 - \gamma)(A \cap B) + \gamma (A \cup B) \qquad (3) $$

$$ A \otimes_\gamma B = (A \cap B)^{(1-\gamma)} (A \cup B)^{\gamma} \qquad (4) $$

The parameter γ in (3) and (4) controls the degree of compensation. The γ-model in (2) is a hybrid
operator of the type in (4). The compensative connectives are very powerful and flexible in that by 
choosing correct parameters, one can not only control the nature (e. g. conjunctive, disjunctive and 
compensative), but also the attitude (e. g. pessimistic and optimistic) of the aggregation. 
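
To make the behavior of these connectives concrete, the following Python sketch implements a weighted generalized mean and the two hybrid forms in (3) and (4). It is an illustration only: the function names are ours, the weighted form of the mean (non-negative weights summing to one) is assumed to match (1), and the particular intersection and union operators passed to the hybrid functions (min/max, or product/algebraic sum) are assumptions rather than choices fixed by the text.

```python
import math


def generalized_mean(x, w, p):
    """Weighted generalized mean (sum_i w_i * x_i**p) ** (1/p), inputs in [0, 1].

    Assumes non-negative weights that sum to 1. The cases p == 0 and
    (p < 0 with a zero input) are handled by their limiting values.
    """
    if abs(sum(w) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    pairs = [(xi, wi) for xi, wi in zip(x, w) if wi > 0.0]
    if p <= 0.0 and any(xi == 0.0 for xi, _ in pairs):
        return 0.0  # limiting value when a weighted input is zero
    if p == 0.0:
        # limit p -> 0 is the weighted geometric mean
        return math.exp(sum(wi * math.log(xi) for xi, wi in pairs))
    return sum(wi * xi ** p for xi, wi in pairs) ** (1.0 / p)


def product(a, b):
    """Product intersection (one possible choice, assumed here)."""
    return a * b


def algebraic_sum(a, b):
    """Algebraic-sum union (one possible choice, assumed here)."""
    return a + b - a * b


def hybrid_arithmetic(a, b, gamma, intersection=min, union=max):
    """Form (3): weighted arithmetic mean of an intersection and a union."""
    return (1.0 - gamma) * intersection(a, b) + gamma * union(a, b)


def hybrid_geometric(a, b, gamma, intersection=product, union=algebraic_sum):
    """Form (4): weighted geometric mean of an intersection and a union."""
    return intersection(a, b) ** (1.0 - gamma) * union(a, b) ** gamma
```

With equal weights and inputs 0.2 and 0.8, p = 1 gives the arithmetic mean 0.5, while p = -50 and p = 50 give values close to the minimum and the maximum, respectively, which is the union/intersection behavior described above.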

One can formulate the problem of multicriteria decision making as follows. The support for 
a decision may depend on supports for (or degrees of satisfaction of) several different criteria, and 
the degree of satisfaction of each criterion may in turn depend on degrees of satisfaction of other 
sub-criteria, and so on. Thus, the decision process can be viewed as a hierarchical network, where 
each node in the network "aggregates" the degree of satisfaction of a particular criterion from the 
observed support. The inputs to each node are the degrees of satisfaction of each of the sub- 
criteria, and the output is the aggregated degree of satisfaction of the criterion. Thus, the decision 
making problem reduces to i) selecting robust and useful criteria for the problem at hand, ii)
finding ways to generate memberships (degrees of satisfaction of criteria) based on values of 
features (criteria) selected, and iii) determining the structure of the network and the nature of the 
connectives at each node of the network. This includes discarding irrelevant criteria to make the 
network simple and robust. 

In our previous research, we have investigated the properties of several union and 
intersection operators, the generalized mean, and the γ-model [14,17]. We have shown that
optimization procedures based on gradient descent and random search can be used to determine the 
proper type of aggregation connective and parameters at each node, given only an approximate 
structure of the network and given a set of training data that represent the inputs at the bottom-most 
level and the desired outputs at the top-most level [14,17]. In this paper, we extend this idea to the 
detection of irrelevant attributes and automatic rule generation. 
3. Redundancy Analysis and Rule Generation

In the approach we propose, we first fuzzily partition the range of values that each criterion
(a property, an attribute, or a relation) can take into several linguistic intervals such as LOW,
MEDIUM and HIGH. The properties, attributes, and relations used are the
ones that may appear in the antecedent clause of a rule. As explained in Section 1, the membership
function for each level needs to be determined according to how humans perceive such attributes, 
properties or relations. The membership values for an observed attribute, property or relationship 
value in each of the levels are calculated using such membership functions. (Methods to generate
degrees of satisfaction of relationships such as "LEFT OF" may be found in [18]). The 
memberships are then aggregated in a fuzzy aggregation network of the type shown in Figure 1. 
The top nodes of the network represent the labels that may appear in the consequents of the rules. 
A suitable structure for the network, and suitable fuzzy aggregation operators for each node are 
chosen. The network is then trained with typical attribute, property or relationship data with the 
corresponding desired output values for the various labels to learn the aggregation connectives and
connections that would best describe the input-output relationships. The learning may be
implemented using a gradient descent approach similar to the backpropagation algorithm [14,17]. It 
is to be noted that there is a constraint on the weights. 
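
The training step can be illustrated for a single generalized-mean node. The sketch below is not the backpropagation-style procedure of [14,17]: it uses finite-difference gradients with a simple step-halving rule, starts from p = 1 and equal weights, and assumes that the constraint on the weights is that they are non-negative and sum to one, which it enforces through the hypothetical parameterization w_i = v_i^2 / sum_j v_j^2. All function names and the toy training samples are ours.

```python
def generalized_mean(x, w, p):
    """Generalized mean of one node; p is clamped to keep the powers finite."""
    p = max(-20.0, min(20.0, p))
    if abs(p) < 1e-6:
        p = 1e-6  # avoid dividing by zero at p = 0
    return sum(wi * max(xi, 1e-3) ** p for wi, xi in zip(w, x)) ** (1.0 / p)


def unpack(params):
    """params = [v_1, ..., v_n, p]; returns (weights, p) with sum(weights) = 1."""
    v, p = params[:-1], params[-1]
    total = max(sum(vi * vi for vi in v), 1e-12)
    return [vi * vi / total for vi in v], p


def loss(params, samples):
    """Sum of squared errors between node output and desired output."""
    w, p = unpack(params)
    return sum((generalized_mean(x, w, p) - t) ** 2 for x, t in samples)


def numerical_gradient(params, samples, eps=1e-5):
    """Central-difference estimate of d(loss)/d(params)."""
    grad = []
    for i in range(len(params)):
        up = params[:i] + [params[i] + eps] + params[i + 1:]
        down = params[:i] + [params[i] - eps] + params[i + 1:]
        grad.append((loss(up, samples) - loss(down, samples)) / (2 * eps))
    return grad


def train(samples, n_inputs, steps=200, rate=1.0):
    """Gradient descent that accepts a step only if it lowers the loss."""
    params = [1.0] * n_inputs + [1.0]  # equal weights, p = 1
    current = loss(params, samples)
    for _ in range(steps):
        grad = numerical_gradient(params, samples)
        trial = [q - rate * g for q, g in zip(params, grad)]
        trial_loss = loss(trial, samples)
        if trial_loss < current:
            params, current = trial, trial_loss
        else:
            rate *= 0.5  # step too large: halve it and try again
    return unpack(params), current
```

For example, `train([([0.2, 0.9], 0.9), ([0.7, 0.3], 0.7)], n_inputs=2)` fits a node to targets that equal the larger input. Because a step is accepted only when it lowers the error, the returned error never exceeds the starting error; for union-like targets of this kind one would expect the learned p to move above 1.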



Figure 1 : Network for generating fuzzy rules. (The input nodes are the linguistic levels L, SL, M, SH and H of each of Feature 1 through Feature N.)
Our experiments indicate that the choice of the network is not very critical. Also any
compensative aggregation operator seems to yield good results. In all the results shown in this 
paper, we used the generalized mean operator as the aggregation operator. As indicated in Section 
2, the generalized mean can closely approximate a union (intersection) operator for a large positive 
(negative) value of p. We start the training with the generalized mean aggregation function with 
p=1. If the training data is better described by a union (intersection) operator, then the value of p
will keep increasing (decreasing) as the training proceeds, until the training is terminated when the
error becomes acceptable. Also, the weights $w_i$ in (1) may be interpreted as the relative importance
factors for the different criteria. Initially we start the training with all the weights associated with a 
node being equal. As the training proceeds the weights automatically adjust so that the overall error 
decreases. Some of the weights eventually become very small. Thus, the training procedure has the 
ability to detect certain types of redundancies in the network. In general, there are three types of 
redundancies (irrelevant criteria) that are encountered in decision making [17]. These correspond to 
uninformative, unreliable and superfluous criteria. 

Uninformative Criteria: These are criteria whose degrees of satisfaction are always approximately 
the same, regardless of the situation. Therefore, these criteria do not provide any information about 
the situation, thus contributing little to the decision-making process. For example, low texture 
content is a criterion that is always satisfied for both clear skies and roads, and hence it would be an
uninformative criterion if one needs to distinguish between these two labels. Uninformative criteria
do not contribute to the robustness of the decision making process, and therefore it is desirable that 
they be eliminated. 

Unreliable Criteria: These correspond to criteria whose degrees of satisfaction do not affect the 
final decision. In other words, the final decision is the same for a wide range of degrees of 
satisfaction. For example, color would be an unreliable criterion for distinguishing a rose from a 
hibiscus because they both come in similar colors. Unreliable criteria do not contribute to the 
robustness of the decision making process, and therefore it is desirable that they be eliminated. 
Superfluous Criteria: These are criteria which are strictly speaking not required to make the
decision. The decisions made without considering such criteria may be as accurate or as reliable. 
For example, one may want to differentiate planar surfaces from spherical surfaces using Gaussian 
and mean curvatures, but the criteria are superfluous because either one of them is sufficient to 
distinguish between planar and spherical surfaces. However, redundancies of this type are not 
entirely without utility, since such redundancies make the decision making process more robust. If 
one criterion fails for some reason, we may still be able to arrive at the correct decision using the 
other. Hence such redundancies may be desirable to increase the robustness of the decision-making 
process. 

Redundancy Detection and Estimation of Confidence Factors: A connection is considered 
redundant if the weight associated with it gradually approaches zero (or a small threshold value)
as the learning proceeds. A node (associated with a criterion) is considered redundant if all the 
connections from the output of this node to other nodes become redundant. Our simulations show 
that both in the case of uninformative criteria and unreliable criteria, the weights corresponding to 
all the output connections go to zero. Therefore such nodes (criteria) are eliminated from the 
structure. The examples in Section 4 illustrate this idea. 
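
The redundancy test just described can be sketched in a few lines of Python. This is only an illustration: the dictionary-of-connections representation, the function name and the node names in the usage note are hypothetical, and the default threshold of 0.01 is the value used in the experiment of Section 4.

```python
def prune_network(weights, threshold=0.01):
    """Drop redundant connections and report redundant nodes.

    weights maps (source_node, target_node) -> trained weight.
    A connection is redundant if its weight is below the threshold;
    a source node is redundant if every connection leaving it is redundant.
    """
    kept = {edge: w for edge, w in weights.items() if w >= threshold}
    all_sources = {src for src, _ in weights}
    live_sources = {src for src, _ in kept}
    redundant_nodes = sorted(all_sources - live_sources)
    return kept, redundant_nodes
```

For instance, if a hypothetical input node `"DE_L"` feeds two hidden nodes with trained weights 0.004 and 0.002, both connections are dropped and `"DE_L"` is reported as a redundant node.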

Rule Generation: The networks that finally result from this training process can be said to represent 
rules that may be used to make the decisions. If the final value of the parameter p at a given node is 
greater than one, the nature of the connective is disjunctive. If the value is less than one, it is 
conjunctive. Once the nature of the connective at each node is determined, we can easily construct 
the fuzzy rules that describe the input-output relations. In Section 4 we present some examples of 
this approach. 
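
As a rough sketch of this last step, the Python fragment below maps the final value of p at a node to a connective and joins the surviving inputs of the node into a clause. The function names and the input labels in the usage note are hypothetical; treating p exactly equal to one as a plain weighted average is our assumption, since the text only covers p greater than and less than one.

```python
def connective(p):
    """Classify a trained generalized-mean node by its exponent p."""
    if p > 1.0:
        return "OR"   # disjunctive
    if p < 1.0:
        return "AND"  # conjunctive
    return "MEAN"     # p == 1: plain weighted average (assumed label)


def node_to_clause(inputs, p):
    """Join the surviving inputs of a node with the connective implied by p."""
    if len(inputs) == 1:
        return inputs[0]
    joiner = " " + connective(p) + " "
    return "(" + joiner.join(inputs) + ")"
```

For example, `node_to_clause(["Difference Entropy is M", "Difference Entropy is SH"], 5.48)` yields a disjunctive clause, because 5.48 is greater than one.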

4. Experimental Results

In this section, we present some typical experimental results involving real data to show the 
effectiveness of the proposed automatic rule generation method. The method is shown to generate 
decision rules that best describe the decision criteria for the classes in the experiment. Figure 1 
shows the general 3-layer neural network used to generate the rules. The input layer consists of nN
input nodes, where N is the number of fuzzy features or criteria (such as properties and
relationships) and n is the number of linguistic levels used to partition each feature. For the hidden 
layer, there are nN hidden nodes where each node is connected to all but one (i.e., it is connected 
to n-1) input nodes representing levels within each feature. The top layer fully connects the hidden 
layer. In the experimental results shown here, we used 5 fuzzy linguistic levels to represent each 
feature, therefore, each hidden node has 4 connections. Other types of network structures were 
also tried; however, the one described above produced the best results. The target values in the
training data were chosen to be 1.0 for the class from which the training data was extracted, and 
0.0 for remaining classes. The feature values were always normalized so that they fall in the range 
[0,1]. Figure 2 depicts the trapezoidal fuzzy sets used to model the intuitive notions of the five
linguistic levels LOW (L), SOMEWHAT LOW (SL), MEDIUM (M), SOMEWHAT
HIGH (SH), and HIGH (H).
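
A trapezoidal partition of this kind can be sketched as follows. The breakpoints below are placeholders chosen only so that neighboring levels overlap on [0,1]; they are assumptions of ours and are not the values used in the experiments. The function names are likewise hypothetical.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 0 outside [a, d], 1 on [b, c], linear between."""
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


# Assumed breakpoints (a, b, c, d) for the five linguistic levels on [0, 1].
LEVELS = {
    "L": (0.00, 0.00, 0.15, 0.25),
    "SL": (0.15, 0.25, 0.35, 0.45),
    "M": (0.35, 0.45, 0.55, 0.65),
    "SH": (0.55, 0.65, 0.75, 0.85),
    "H": (0.75, 0.85, 1.00, 1.00),
}


def fuzzify(x):
    """Membership of a normalized feature value in each linguistic level."""
    return {name: trapezoid(x, *points) for name, points in LEVELS.items()}
```

With these placeholder breakpoints, a feature value of 0.5 belongs fully to M, while a value of 0.2 is shared between L and SL.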


Figure 2 : Graphical representations of various fuzzy sets (L, SL, M, SH and H).
4.1 Example

Figure 3(a) shows a 200x200 image used for training in order to obtain rules that best
describe the object (shuttle) and background. After examining a variety of possible features to be
used, the two best features chosen were the difference entropy and contrast features. For 
definitions of the features, see report on membership generation methods. Figures 3(b) and 3(c) 
show images using these features. Figure 3(d) shows the scatter plot of the training samples 
extracted from two different regions (shuttle and background) in the image. We used 50 samples 
from each class. The membership values in each linguistic level for each sample are computed using
the membership functions shown in Figure 2, and these with the corresponding desired targets are 
used as training data in the training algorithm described in Section 3. Figure 4 shows the reduced 
network after training. All the connections with weights below a value of 0.01 were considered 
redundant. Table 1 shows the final weights (which determine the confidence factors of the rules 
and criteria) and the p parameter values (which determine the conjunctive or disjunctive nature of 
the connective) for the specified nodes in Figure 4. Using the properties for the p values obtained, 
the following rules are generated, as discussed in Section 3. 

$$ \text{Class Shuttle} = (\text{Difference Entropy M} \lor \text{Difference Entropy SH} \lor \text{Difference Entropy H}) \lor (\text{Contrast SL}). \qquad (5) $$

In other words, the rule may be summarized as

R_Shuttle : IF Difference Entropy is M or SH or H, or Contrast is SL
THEN the class is Shuttle.

Similarly,

$$ \text{Class Background} = (\text{Difference Entropy SL} \lor \text{Difference Entropy SH}) \land (\text{Contrast L}) \qquad (6) $$

and

R_Background : IF (Difference Entropy is SL or SH) and Contrast is L
THEN the class is Background.
These rules make sense, since expanding (5) and (6) yields the cell locations
in which the training samples are located in Figure 3(d).
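
Rules (5) and (6) can be evaluated on the membership values of a sample as sketched below. For simplicity the sketch uses max for "or" and min for "and"; this is an assumption made for illustration, since the trained network realizes these connectives with generalized means whose weights and p values are given in Table 1. The function names and the sample memberships in the usage note are hypothetical.

```python
def shuttle_support(de, contrast):
    """Rule (5): (DE is M or SH or H) or (Contrast is SL)."""
    return max(max(de["M"], de["SH"], de["H"]), contrast["SL"])


def background_support(de, contrast):
    """Rule (6): (DE is SL or SH) and (Contrast is L)."""
    return min(max(de["SL"], de["SH"]), contrast["L"])


def classify(de, contrast):
    """Pick the label whose rule receives the larger support."""
    if shuttle_support(de, contrast) >= background_support(de, contrast):
        return "Shuttle"
    return "Background"
```

Here `de` and `contrast` are dictionaries of memberships in the levels L, SL, M, SH and H for the difference entropy and contrast features. A sample with difference entropy mostly M (0.8) and contrast mostly SL (0.7) receives support 0.8 for Shuttle.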




Figure 3(a) : image for training, (b) : difference entropy image, and (c) : contrast image. 
Figure 3(d) : Scatter plot of training samples for the classes shuttle and background (axis label: Contrast).
Table 1 : Values of weights and parameter p for the reduced network.

          weights             p
node 1    0.70, 0.15, 0.15    5.48
node 2    0.94, 0.06         -0.21
node 3    0.49, 0.01, 0.50    7.04
node 4    0.94, 0.06          4.00
node 5    1.0                 0.78
node 6    1.0                 1.88
node 7    1.0                 1.88



4.2 Segmentation 

Figure 5(a) shows a 200x200 test image for segmentation using the reduced network after 
training shown in Figure 4. Figures 5(b) and 5(c) show images of the two features (difference 
entropy and contrast) that were chosen previously. After employing the shrink and expand 
algorithm to remove noise points, the resulting segmented image is shown in Figure 5(d).
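
The shrink and expand step is not detailed here. One common reading, which we assume for the sketch below, is a binary erosion followed by a binary dilation with a 3x3 neighborhood, so that isolated noise points disappear while larger regions are restored. The function names, the neighborhood size and the number of passes are assumptions of ours.

```python
def _apply(mask, combine):
    """Apply `combine` (all or any) over each pixel's 3x3 neighborhood.

    Pixels outside the image are treated as background (False).
    """
    rows, cols = len(mask), len(mask[0])

    def at(r, c):
        return 0 <= r < rows and 0 <= c < cols and bool(mask[r][c])

    return [
        [
            combine(at(r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def shrink(mask):
    """Keep a pixel only if its whole 3x3 neighborhood is set (erosion)."""
    return _apply(mask, all)


def expand(mask):
    """Set a pixel if any pixel in its 3x3 neighborhood is set (dilation)."""
    return _apply(mask, any)


def shrink_and_expand(mask, passes=1):
    """Shrink `passes` times, then expand `passes` times."""
    out = mask
    for _ in range(passes):
        out = shrink(out)
    for _ in range(passes):
        out = expand(out)
    return out
```

On a binary label image given as a list of rows, a single pass removes an isolated pixel but returns a 3x3 block unchanged.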

5. Summary and Conclusions 

In this paper, we introduced a new method for automatically generating rules for high level 
vision. The range of each feature is fuzzily partitioned into several linguistic intervals such as 
LOW, MEDIUM and HIGH. The membership function for each level is determined, and the 
membership values for an observed feature value in each of the linguistic levels are calculated using
these membership functions. The memberships are then aggregated in a fuzzy aggregation
network. The networks are trained with typical data to learn the aggregation connectives and
connections that would give rise to the desired decisions. The learning process can also be made to 
discard redundant features. The networks that finally result from this training process can be said 
to represent rules that may be used to make the decisions. Riseman et al. used similar rules for
segmentation and labeling of outdoor scenes, but the weights used in the aggregation scheme were
determined empirically [19]. The ability to generate rules that can be used in fuzzy logic and
rule-based systems directly from training data is a novel aspect of our approach. One of the issues that
requires investigation is the choice of the number of linguistic levels and its effect on the decision 
making process. 


Figure 5(a) : image for testing, (b) : difference entropy image,
(c) : contrast image, and (d) : segmented image.
Possibility Expectation and Its Decision Making Algorithm

James M. Keller and Bolin Tan

Electrical and Computer Engineering
University of Missouri
Columbia, MO 65211, USA

Abstract

The fuzzy integral has been shown to be an effective tool for the aggregation of evidence in
decision making. Of primary importance in the development of a fuzzy integral pattern 
recognition algorithm is the choice (construction) of the measure which embodies the 
importance of subsets of sources of evidence. Sugeno fuzzy measures have received the most 
attention due to the recursive nature of the fabrication of the measure on nested sequences of 
subsets. Possibility measures exhibit an even simpler generation capability, but usually 
require that one of the sources of information possess complete credibility. In real 
applications, such normalization may not be possible, or even desirable. In this report both 
the theory and a decision making algorithm for a variation of the fuzzy integral are presented. 
This integral is based on a possibility measure where it is not required that the measure of the
universe be unity. A training algorithm for the possibility densities in a pattern recognition 
application is also presented with the results demonstrated on the shuttle-earth-space training 
and testing images. 


1. Introduction 

Decision making is a basic problem in science, engineering, and even in daily life. There 
are often conflicting requirements of low error rates and minimum computation time to 
reduce the cost. The purpose of this paper is to propose the concept of possibility expectation 
via the possibility integral as a decision making scheme, which can be used to construct 
optimal decision making algorithms. A possibility expectation is a value of nonlinear 
integration of two pieces of information, namely, an evidence function h(x) and a possibility 
measure Pos(·). A possibility measure is a monotonic set function with the property that the
measure of the universe X can be less than or equal to unity. 

An example of possibility expectation is the following: In the court room, although the 
witnesses for both the defendant and plaintiff promise that they will tell the truth, the judge 
still needs to assign the grade of credibility (possibility densities) to each person to evaluate 
what the person says (evidence). The judge will integrate what each group of witnesses said with
his belief in that group's credibility (possibility measure). Then the judge makes his decision 
(possibility expectation).

In multicriteria decision making, as can be found in most pattern recognition problems, 
the value of each source of information (and thus all subsets of sources) toward each 
alternative can be different. For example, "greenness" may be a very important feature for 
recognizing certain types of trees in an image; whereas it may be quite unimportant as a feature 
for a roof of a building. This difference in the importance or credibility of subsets of 
information sources will be encoded in a possibility measure. The degree to which a given 
image region is green, to continue the example, is objective evidence supplied by the 
information source. After collecting all such objective information, it is the job of the decision 
making algorithm to fuse the objective evidence together with the worth of the sources. In our 
methodology, this will be accomplished by utilizing the possibility integral, a variation of the 
fuzzy integral [1].

The particular possibility measures which we describe generalize fuzzy measures in that it 
is not required that the measure of the entire domain of discourse be one. In a pattern 
recognition problem, it may not be possible, or may not be desirable to force one of the sources 
of information to have "perfect credibility". By relaxing this requirement, not only do we
match real situations better, we also provide the opportunity to create better decision making 
algorithms, as we shall see later. 

For a pattern recognition environment, a method to learn the possibility densities (values 
upon which the measure is generated) from training data is given. The results of the 
subsequent algorithm are used to segment a shuttle from the earth and space background.

2. Possibility Measures and Possibility Integral 

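Definition 2.1 A set function $\mathrm{Pos}(\cdot): 2^X \to [0, 1]$ is called a possibility measure if it satisfies the
following properties:

(1) $\mathrm{Pos}(\emptyset) = 0$, $\mathrm{Pos}(X) \le 1$.

(2) If $A, B \in 2^X$ and $A \subset B$, then $\mathrm{Pos}(A) \le \mathrm{Pos}(B)$.

(3) $\mathrm{Pos}\left( \bigcup_{j \in J} A_j \right) = \sup_{j \in J} \left[ \mathrm{Pos}(A_j) \right]$.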

Note: If X is finite, a possibility measure is not a fuzzy measure when Pos(X) < 1; it is the
same as a fuzzy measure only when Pos(X) = 1. If X is infinite, a possibility measure is not a fuzzy
measure in general [2]. Puri and Ralescu [3] give two counterexamples which show that, even in
"nice" cases, a possibility measure is not a fuzzy measure in the infinite case.
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Definition 2.2 Let X = { Xj I j = 1 n}bea finite set and let Pos be a possibility measure on 2 X . 

The set { pJ = Pos({xj )) I j = 1 n } is called the set of possibility densities for Pos. 

By definition of the possibility measure, it is clear that the measure of any subset A of X 
can be generated by 

$$\mathrm{Pos}(A) = \max_{x_j \in A} p^j,$$

and hence, a possibility measure is easily generated by its densities. 
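
As a minimal sketch of this generation step, the function below computes the measure of a subset as the maximum of its densities. The source names and density values are hypothetical and only serve as an illustration; the name `pos_measure` is likewise an assumption.

```python
def pos_measure(densities, subset):
    """Pos(A): maximum of the densities p^j over the sources x_j in A (0 for the empty set)."""
    return max((densities[name] for name in subset), default=0.0)


# Hypothetical densities for three information sources.
densities = {"greenness": 0.8, "texture": 0.5, "height": 0.3}

print(pos_measure(densities, {"texture", "height"}))  # 0.5
print(pos_measure(densities, densities.keys()))       # 0.8, so Pos(X) < 1 is allowed
```

Note that nothing forces the measure of the whole set to equal one, which is the relaxation discussed in the introduction.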

We note that possibility theory can be induced not only from the nested bodies of evidence
within the Dempster-Shafer theory [4], but also from the fuzzy sets introduced by Zadeh [6]. A
fuzzy set F is a set whose elements are characterized by the membership grade function
$\mu_F(x): X \to [0, 1]$. A value of $\mu_F(x)$ expresses the grade of membership with which an element
$x \in X$ belongs to the fuzzy subset F of X. Let $\pi_F(x) = \mu_F(x)$ be a possibility distribution induced by a
fuzzy set F. In general, a possibility distribution is thought of as an elastic restriction on the
values within a domain of discourse which a fuzzy variable may assume [5]. The fuzzy set F
provides the meaning of the restriction. A possibility measure is defined as

$$\mathrm{Pos}(A) = \sup_{x \in A} \pi_F(x) \quad \text{for all } A \in 2^X.$$

This relationship holds also for non-normal fuzzy sets [6].

Although a fuzzy set and a possibility distribution have a common mathematical expression,
the underlying concepts are different [5].

Our possibility measures are non-normalized generalizations of what are referred to as
S-decomposable measures [7, 8], these being a class of fuzzy measures which are easily
computable.

Definition 2.3 Let h(x) be a function such that $h: X \to [0, 1]$, and let $\mathrm{Pos}(\cdot)$ be a possibility
measure on $2^X$. The possibility integral or the possibility expectation of h(x) with respect to
$\mathrm{Pos}(\cdot)$ is defined as

$$\int_X h(x) \circ \mathrm{Pos}(\cdot) = \sup_{\alpha \in [0,1]} \bigl[\alpha \wedge \mathrm{Pos}(A_\alpha)\bigr], \quad \text{where } A_\alpha = \{x \mid h(x) \ge \alpha\}.$$


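When $X = \{x_i \mid i = 1, \ldots, n\}$ is finite, if we reorder X such that $h(x_1) \ge h(x_2) \ge \ldots \ge h(x_n)$,
then the possibility integral can be written as

$$\int_X h(x) \circ \mathrm{Pos}(\cdot) = \bigvee_{j=1}^{n} \bigl[h(x_j) \wedge \mathrm{Pos}(A_j)\bigr], \quad \text{where } A_j = \{x_1, x_2, \ldots, x_j\}.$$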



The rationale of the possibility expectation is to find the source within the universe where 
both the information value h(xj) and the possibility measure Pos(Aj) are compatibly large, that 
is, where the feasibility of the data and the reliability of a subset of sources is jointly optimal. 
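
The finite form of the possibility integral can be sketched in a few lines. The example below sorts the sources by decreasing evidence and keeps a running maximum of the densities, which equals Pos(A_j) because the measure of a set is the maximum of its densities. The evidence and density values are hypothetical, as is the function name.

```python
def possibility_integral(h, densities):
    """Finite possibility integral: max over j of min(h(x_j), Pos(A_j)).

    h and densities are dicts keyed by source name. Sources are visited in
    order of decreasing h, so A_j is the set of the j sources seen so far.
    """
    order = sorted(h, key=h.get, reverse=True)  # h(x_1) >= h(x_2) >= ... >= h(x_n)
    pos_aj = 0.0  # measure of the empty set
    best = 0.0
    for name in order:
        pos_aj = max(pos_aj, densities[name])   # Pos(A_j) as a running maximum
        best = max(best, min(h[name], pos_aj))  # h(x_j) "and" Pos(A_j)
    return best


h = {"greenness": 0.9, "texture": 0.6, "height": 0.2}          # hypothetical evidence
densities = {"greenness": 0.4, "texture": 0.7, "height": 0.3}  # hypothetical densities

print(possibility_integral(h, densities))  # 0.6
```

In this example the strongest evidence (0.9) comes from a source of low worth (0.4), so the integral settles on the second source, where evidence and worth are jointly largest.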

The fuzzy integral developed by Sugeno [1] has the same formulation with the exception 
that a fuzzy measure is used in lieu of the possibility measure. One of the advantages of the 
possibility integral is that the measures $\mathrm{Pos}(A_j)$ are easily calculated from the densities by the
recursive relationship

$$\mathrm{Pos}(A_1) = \mathrm{Pos}(\{x_1\}) = p^1;$$

$$\mathrm{Pos}(A_j) = \mathrm{Pos}(A_{j-1} \cup \{x_j\}) = \mathrm{Pos}(A_{j-1}) \vee p^j.$$

In contrast, for a Sugeno fuzzy measure with the fuzzy densities $\{g^1, \ldots, g^n\}$, this
recursive definition becomes

$$g_\lambda(A_1) = g_\lambda(\{x_1\}) = g^1;$$

$$g_\lambda(A_j) = g_\lambda(A_{j-1} \cup \{x_j\}) = g_\lambda(A_{j-1}) + g^j + \lambda\, g^j\, g_\lambda(A_{j-1}),$$

where $\lambda > -1$ [1, 10, 11]. The value of $\lambda$ must be calculated from the equation

$$\prod_{i=1}^{n} (1 + \lambda g^i) = 1 + \lambda \quad [1].$$

If one is going to try to learn a measure (iteratively) from training data, the amount of 
computations necessary to learn a possibility measure, and then evaluate its possibility 
integral is considerably less than that required for a Sugeno fuzzy measure and its fuzzy 
Integral. 
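
To make the difference in effort concrete, the sketch below solves the Sugeno equation prod_i(1 + lambda*g^i) = 1 + lambda for the nonzero root lambda > -1 by bisection and then builds the nested measures recursively. The bisection strategy, the tolerances, and the function names are assumptions chosen for illustration; the possibility measure needs none of this, only a running maximum.

```python
def sugeno_lambda(g, iterations=200):
    """Nonzero root lam > -1 of prod_i(1 + lam*g_i) = 1 + lam, found by bisection."""
    total = sum(g)
    if abs(total - 1.0) < 1e-6:
        return 0.0  # densities (nearly) add up to one: treat the measure as additive

    def f(lam):
        prod = 1.0
        for gi in g:
            prod *= 1.0 + lam * gi
        return prod - (1.0 + lam)

    if total > 1.0:
        lo, hi = -1.0, -1e-7          # the root lies in (-1, 0)
    else:
        lo, hi = 1e-7, 1.0            # the root lies in (0, inf): grow hi to bracket it
        while f(hi) < 0.0:
            hi *= 2.0
            if hi > 1e12:
                raise ValueError("no root found; need at least two positive densities")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if (f(lo) < 0.0) == (f(mid) < 0.0):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sugeno_measures(g_sorted, lam):
    """g_lam(A_j) for the nested sets A_1, A_2, ..., A_n."""
    out, acc = [], 0.0
    for gj in g_sorted:
        acc = acc + gj + lam * gj * acc
        out.append(acc)
    return out


lam = sugeno_lambda([0.2, 0.3])
print(lam)                               # close to 25/3
print(sugeno_measures([0.2, 0.3], lam))  # last value is close to 1, i.e. g(X) = 1
```

Each change to a density during iterative training requires the root to be recomputed, which is the source of the extra cost mentioned above.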

For a multiclass pattern recognition problem (or any multicriteria decision making
problem), the set X represents sources of information (criteria). Each class (alternative) will
have its own evidence function $h_i: X \to [0, 1]$ to assess the feasibility that the decision is class i
(alternative i) from the standpoint of each individual source, $x_j$. Also, each class will have its
own possibility measure $\mathrm{Pos}_i$ which determines the worth of all subsets of sources in deciding
that a particular object belongs to class i. Finally, the collection of possibility integrals

$$e_i = \int_X h_i(x) \circ \mathrm{Pos}_i(\cdot)$$

gives a class-individualized "fusion" of the direct evidence with the worth of that evidence. A
final crisp decision can be made from the possibility expectations (integral values), for
example, pick the class corresponding to the maximum possibility expectation. Alternately,
these expectation values can be used as confidences for later processing.

3. Properties of The Possibility Integral 

Several interesting properties of the possibility integral are proved in [11]. Of particular
interest to the algorithm presented in the next section are the following two results.

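Theorem 3.1 $0 \le \int_X h(x) \circ \mathrm{Pos}(\cdot) \le \mathrm{Pos}(X)$.

Theorem 3.2 If $h_1(x) \le h_2(x)\ \forall x$:

$$\int_X h_1(x) \circ \mathrm{Pos}(\cdot) \le \int_X h_2(x) \circ \mathrm{Pos}(\cdot), \quad \text{if } \mathrm{Pos}(X) > h_1(x) \text{ for all } x,$$

$$\int_X h_1(x) \circ \mathrm{Pos}(\cdot) = \int_X h_2(x) \circ \mathrm{Pos}(\cdot), \quad \text{if } \mathrm{Pos}(X) \le h_1(x) \text{ for all } x.$$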

4. Decision Rule and Training Algorithm 

In the procedure given below, we consider a two class pattern recognition problem, or a two 
alternative decision process. The approach can be extended directly to multiple classes, but 
from the particular structure of the training mechanism, it would be more appropriate to view 
it as a series of two class problems, either as pairwise distinctions, or as each class against all 
of the remaining classes. Since the possibility integral algorithm does not create geometric
decision boundaries in feature spaces (as, for example, Bayes Decision Theory), the second 
approach is reasonable and contains fewer subdecisions which need to be made to extend this 
to multiple classes. 

The actual decision algorithm utilizes the nature of the possibility integral to split the 
input objects (as represented by the evidence function h(x) ) into four groups to reduce the 
computational load. The first two groups deal with the case where the strength of all objective 
evidence for one class outweighs that for the other. In most cases, this corresponds to the fact 
that, in a pattern recognition problem, a majority of the data are easily distinguished (being 
quite typical of their class). Decision rules 1 and 2 below are a consequence of Theorem 3.2
assuming that the possibility measures for both classes in this case are identical. Of course, 
there are problems where the objective evidence for one class can dominate that for the other 
class, and yet, the object belongs to the latter. This could happen if the worth of the source, i.e.,
the densities, are vastly different between classes. During training, this condition is 
monitored, and if the training data produce such outcomes, the first two rules are abandoned, 
forcing all training samples to be "conflict data”. 

The initial definition of "conflict" is an object where the evidence function for one class 
does not dominate that of the other. In this case, we split the training data (and also the 
unknown test objects) into two subgroups based on the class receiving the highest degree of 
support from any source. For each group, two possibility measures are formed which minimize 
the total misclassification of the training data. The purpose of partitioning the data in this
manner is to reduce the size of the training set, since our initial training scheme is
a complete search through a quantized set of all pairs of density functions. To reduce further
the amount of computations, we note that the value of a possibility integral cannot be larger
than the maximum of the function being integrated. This fact allows us to restrict the range of
density values to be no larger than the maximum evidential support in the training set.
(Reducing the training sets gives more opportunity to invoke this restriction.) Optimal pairs of
density functions (in terms of minimal error rate on the training data) are formed and then used
in the testing cycle. There are 4 possibility measures generated during training: one from each
class in each of the two subgroups of conflict data.
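
A complete search of this kind can be sketched as follows for one subgroup of conflict data. The quantization levels, the sample format `(h1, h2, label)`, the tie-breaking toward class 2, and all function names are assumptions made for illustration; the report does not specify them.

```python
from itertools import product


def possibility_integral(h, p):
    """max over j of min(h(x_j), Pos(A_j)), sources visited in order of decreasing h."""
    pos_aj, best = 0.0, 0.0
    for j in sorted(range(len(h)), key=lambda j: h[j], reverse=True):
        pos_aj = max(pos_aj, p[j])
        best = max(best, min(h[j], pos_aj))
    return best


def train_densities(samples, levels):
    """Exhaustive search over quantized density pairs; returns (errors, p1, p2).

    samples is a list of (h1, h2, label) with label 1 or 2 (hypothetical format).
    """
    # Densities above the largest evidence value cannot change any integral.
    cap = max(max(max(h1), max(h2)) for h1, h2, _ in samples)
    usable = [v for v in levels if v <= cap]
    n = len(samples[0][0])
    best = None
    for p1 in product(usable, repeat=n):
        for p2 in product(usable, repeat=n):
            errors = 0
            for h1, h2, label in samples:
                e1 = possibility_integral(h1, p1)
                e2 = possibility_integral(h2, p2)
                errors += (1 if e1 > e2 else 2) != label
            if best is None or errors < best[0]:
                best = (errors, p1, p2)
    return best


samples = [([0.9, 0.2], [0.3, 0.6], 2), ([0.8, 0.3], [0.4, 0.5], 1)]
errors, p1, p2 = train_densities(samples, levels=[0.0, 0.3, 0.6, 0.9])
print(errors, p1, p2)
```

The cost grows as the number of levels raised to twice the number of sources, which is why shrinking the training set and capping the density range matter.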

The decision algorithm is summarized below. 

BEGIN

FOR each feature data vector DO
    obtain $h_1(x_j)$ for all j and $h_2(x_j)$ for all j;

    (1) IF $h_1(x_j) > h_2(x_j)$ for all j, THEN the feature data vector belongs to class 1.

    (2) ELSE IF $h_1(x_j) < h_2(x_j)$ for all j, THEN the feature data vector belongs to class 2.

    (3) ELSE

        IF $\bigvee_j h_1(x_j) > \bigvee_j h_2(x_j)$, THEN
            $e_1 = \bigvee_j [h_1(x_j) \wedge \mathrm{Pos}_{11}(A_j)]$, $e_2 = \bigvee_j [h_2(x_j) \wedge \mathrm{Pos}_{12}(A_j)]$
        ELSE
            $e_1 = \bigvee_j [h_1(x_j) \wedge \mathrm{Pos}_{21}(A_j)]$, $e_2 = \bigvee_j [h_2(x_j) \wedge \mathrm{Pos}_{22}(A_j)]$
        END IF

        IF $e_1 > e_2$, THEN the feature data vector belongs to class 1,
        ELSE the feature data vector belongs to class 2.
        END IF
    END IF
END FOR
END.
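
The decision rule can be sketched in runnable form as below. The dictionary `dens`, keyed by (conflict subgroup, class), and the example values are hypothetical; ties are resolved toward class 2, following the strict comparisons in the summary above.

```python
def possibility_integral(h, p):
    """max over j of min(h(x_j), Pos(A_j)), sources visited in order of decreasing h."""
    pos_aj, best = 0.0, 0.0
    for j in sorted(range(len(h)), key=lambda j: h[j], reverse=True):
        pos_aj = max(pos_aj, p[j])
        best = max(best, min(h[j], pos_aj))
    return best


def classify(h1, h2, dens):
    """Two-class decision; dens[(group, cls)] is the density list for that measure."""
    if all(a > b for a, b in zip(h1, h2)):
        return 1                                   # rule (1): class 1 dominates
    if all(a < b for a, b in zip(h1, h2)):
        return 2                                   # rule (2): class 2 dominates
    group = 1 if max(h1) > max(h2) else 2          # rule (3): choose the conflict subgroup
    e1 = possibility_integral(h1, dens[(group, 1)])
    e2 = possibility_integral(h2, dens[(group, 2)])
    return 1 if e1 > e2 else 2


# Hypothetical learned densities for the four measures.
dens = {
    (1, 1): [0.5, 0.5, 0.5], (1, 2): [0.7, 0.7, 0.7],
    (2, 1): [0.6, 0.6, 0.6], (2, 2): [0.4, 0.4, 0.4],
}

print(classify([0.9, 0.8, 0.7], [0.1, 0.2, 0.3], dens))  # 1, by rule (1)
print(classify([0.9, 0.2, 0.4], [0.3, 0.6, 0.5], dens))  # 2, by rule (3)
```

In the second call class 1 has the single strongest piece of evidence, yet the lower worth of its sources caps its expectation at 0.5 against 0.6 for class 2.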

5. Experimental Results 

Two shuttle-earth-space intensity images were used in the experiment, in which all the
data from the two images were treated as "conflict data" and hence only the third decision rule
applies.

The training image is shown in Fig. 5.1 and the test image is shown in Fig. 5.5. Three
texture feature images (contrast, difference, and entropy) were derived from each of the training
and test images, i.e., three feature images for training and three feature images
for testing. (For the definition of these features, please see the section on membership generation
techniques in this report.) The three feature images used for training the possibility densities
are shown in Fig. 5.2. The three feature images used in testing are shown in Fig. 5.6.

The possibility distribution (or membership function) of each class in each feature, which is
used to generate the evidence function h(x), is determined by applying the possibilistic clustering
algorithm to the histograms of each class in each feature. That algorithm is described in another
section of this report.

While training, the possibility densities were determined with the "perceptron criterion"
(i.e., minimize the decision error) from the feature images in Fig. 5.2. The segmentation result
corresponding to the possibility measure(s) for the training image is shown in Fig. 5.3, in
which the shuttle and its background are clearly segmented, except that the shuttle body seems
disconnected. To improve the connection of the shuttle body, the possibility densities of the
shuttle were raised slightly, from which the segmentation result in Fig. 5.4 and the result in
Fig. 5.7 (for the test case) were obtained. These results can be improved quite easily with a
shrink-expand operation.

6. Conclusion 

In this paper, a decision making algorithm based on a variation of the fuzzy integral was
proposed. The possibility integral has a particularly simple generation capability. The
algorithm was run on the shuttle-earth-space images, and reasonably good results were obtained.

Fig 5.2 (top left) Intensity training image.

(top right) Contrast feature image,
(bottom left) Difference feature image,
(bottom right) Entropy feature image.






possibility integral algorithm. 



Fig 5.4 Segmented image 2 using the possibility integral algorithm.





Fig 5.6 (top left) Intensity testing image.

(top right) Contrast feature image,
(bottom left) Difference feature image,
(bottom right) Entropy feature image.

Fig 5.7 Segmented testing image using the possibility integral algorithm.



Our work in this area has progressed nicely. We have designed and implemented a 
new algorithm to generate membership values from a set of training data using a multi-layer 
neural network. This is in addition to the progress we made in the transformation of 
"probability density functions" into possibility distributions for use in assigning 
membership values to individual points, as reported in the third quarter report.

There has been intensive research in neural network applications to pattern
recognition problems. Particularly, the back-propagation network has attracted many 
researchers because of its outstanding performance in pattern recognition applications. In 
this section, we describe a new method to generate membership functions from training 
data using a multilayer neural network. The basic idea behind the approach is as follows. 

The output values of a sigmoid activation function of a neuron bear remarkable resemblance 
to membership values. Therefore, we can regard the sigmoid activation values as the 
membership values in fuzzy set theory. Thus, in order to generate class membership 
values, we first train a suitable multilayer network using a training algorithm such as the 
back-propagation algorithm. After the training procedure converges, the resulting network 
can be treated as a membership generation network, where the inputs are feature values and 
the outputs are membership values in the different classes. 

This method allows fairly complex membership functions to be generated because 
the network is highly nonlinear in general. Also, it is to be noted that the membership 
functions are generated from a classification point of view. For pattern recognition 
applications, this is highly desirable, although the membership values may not be indicative 
of the degree of typicality of a feature value in a particular class. 


A. Typical Example 

In this section we show an example of a membership network that can generate 
membership values for "shuttle" and "background". The network we used had one input 
unit, eight hidden units and two output units. Input data to the network were feature values 
and the observed activation values of the outputs after the network was trained with the 
back-propagation algorithm were considered as the degree of belonging to the particular 
classes. In this experiment, there were only two classes: object (shuttle) and background 
(space and earth). The training image is shown in Fig 1. 



We generated membership functions corresponding to four texture features. These
four feature images are shown in Fig 2. The features were contrast, entropy,
difference entropy, and homogeneity. They are defined by


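$$\text{Contrast} = \sum_{n=0}^{N_g - 1} n^2 \Biggl\{ \sum_{i=1}^{N_g} \sum_{j=1}^{N_g} p(i,j) \;:\; |i - j| = n \Biggr\}$$

$$\text{Entropy} = -\sum_{i=1}^{N_g} \sum_{j=1}^{N_g} p(i,j) \log\bigl(p(i,j)\bigr)$$

$$\text{Difference Entropy} = -\sum_{k=0}^{N_g - 1} p_{x-y}(k) \log\bigl(p_{x-y}(k)\bigr)$$

$$\text{Homogeneity} = \sum_{i=1}^{N_g} \sum_{j=1}^{N_g} \frac{p(i,j)}{1 + (i - j)^2}$$

where p(i,j) is the (i,j)-th entry in the spatial gray level dependence matrix, and $N_g$ is the
number of gray levels. Also, $p_{x-y}(k)$ is defined by

$$p_{x-y}(k) = \sum_{i=1}^{N_g} \sum_{j=1}^{N_g} p(i,j) \quad \text{such that } |i - j| = k.$$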

(See [1,2] for details.) 
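
As an illustration, the four features can be computed from a normalized spatial gray level dependence (co-occurrence) matrix as sketched below. The sketch assumes the matrix entries sum to one and uses natural logarithms; the function name and the small 2 x 2 example are hypothetical.

```python
import math


def texture_features(p):
    """Contrast, entropy, difference entropy and homogeneity of a normalized
    co-occurrence matrix p, given as a list of rows whose entries sum to one."""
    n = len(p)
    p_diff = [0.0] * n  # p_{x-y}(k): total probability of pairs with |i - j| = k
    contrast = entropy = homogeneity = 0.0
    for i in range(n):
        for j in range(n):
            pij = p[i][j]
            p_diff[abs(i - j)] += pij
            contrast += (i - j) ** 2 * pij           # same as summing n^2 over |i - j| = n
            homogeneity += pij / (1 + (i - j) ** 2)
            if pij > 0:
                entropy -= pij * math.log(pij)
    diff_entropy = -sum(v * math.log(v) for v in p_diff if v > 0)
    return contrast, entropy, diff_entropy, homogeneity


uniform = [[0.25, 0.25], [0.25, 0.25]]
contrast, entropy, diff_entropy, homogeneity = texture_features(uniform)
print(contrast, homogeneity)  # 0.5 0.75
```

Zero entries are skipped in the entropy sums, following the usual convention that 0 log 0 is taken as 0.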


All feature values were normalized to lie between 0 and 255. The training sets were 
formed by manually picking samples from the object and background regions of all four 
texture feature images. There were 100 samples for each class. After the network was 
trained, we fed gray values (0-255) to the input unit and collected the activation values of 
output units to generate the membership functions. Fig 3.1 and Fig 3.2 show the 
histograms of the features for the background and the object, and the corresponding 
membership values for all four features. 


B. Discussion 


Fig 3.1 (c) shows the membership functions of object and background for the contrast
feature. The membership functions are very steep because only one or two gray level
values overlap between the histograms of the background and the object. On the other
hand, Fig 3.2 shows broader membership functions because of a broader overlapping area
between the histograms for the entropy and homogeneity features. An interesting
observation is that when histograms of object and background overlap, the network sets the
crossover point at the middle of the overlapping area. This reveals the nice membership
generation capability of the neural network.

C. Conclusion 

This heuristic method of generating membership functions has some merits
compared to the probability-possibility transformation method described in our third
quarterly report. The transformation method requires a precise estimation of a probability
density function. In practice, this is difficult to achieve when the number of training
samples is small. Also, the resulting shape of the membership function is almost the same as
the probability density function. In other words, membership functions generated by these
methods seem to have a frequency interpretation of the data. Fig 4 and Fig 5 show
examples of the transformation based membership functions obtained with 1,000 samples
per feature per class. Even with this high number, the functions are rather noisy.

One shortcoming of this heuristic method is that the memberships do not represent
"typicality". However, if the memberships are to be used subsequently in a pattern
recognition algorithm, then this method will provide better classification results.

Fig 2. Texture features: clockwise, contrast, difference entropy, entropy, and homogeneity.

Fig 3.1 Histogram of

Fig 3.2 Histogram of background and object, and
corresponding membership function.













Fig 4.1 Membership and p.d.f. by Dubois and Prade: the small graph is the p.d.f. and the large one is the membership.

Fig 4.2 Membership and p.d.f. by Dubois and Prade: the small graph is the p.d.f. and the large one is the membership.

Fig 5.1 Membership and p.d.f. by Klir: the small graph is the p.d.f. and the large one is the membership.

Fig 5.2 Membership and p.d.f. by Klir: the small graph is the p.d.f. and the large one is the membership.







Clustering Methods 

At the Third International Workshop on Neural Networks and Fuzzy Logic, we 
presented our new approach of possibilistic clustering applied to the recognition of
Plano-Quadric clusters. In what follows, we present the paper which will appear in the
proceedings of that Workshop, followed by other examples of the results of the algorithms. 
Several examples are of images of the shuttle.

Abstract

Clustering methods have been used extensively in computer vision and pattern 
recognition. Fuzzy clustering has been shown to be advantageous over crisp (or traditional) 
clustering in that total commitment of a vector to a given class is not required at each 
iteration. Recently fuzzy clustering methods have shown spectacular ability to detect not 
only hypervolume clusters, but also clusters which are actually "thin shells", i.e., curves 
and surfaces. Most analytic fuzzy clustering approaches are derived from Bezdek's Fuzzy 
C-Means (FCM) algorithm. The FCM uses the probabilistic constraint that the 
memberships of a data point across classes sum to one. This constraint was used to 
generate the membership update equations for an iterative algorithm. Unfortunately, the 
memberships resulting from FCM and its derivatives do not correspond to the intuitive 
concept of degree of belonging, and moreover, the algorithms have considerable trouble in 
noisy environments. Recently, we cast the clustering problem into the framework of 
possibility theory. Our approach was radically different from the existing clustering 
methods in that the resulting partition of the data can be interpreted as a possibilistic 
partition, and the membership values may be interpreted as degrees of possibility of the 
points belonging to the classes. We constructed an appropriate objective function whose
minimum will characterize a good possibilistic partition of the data, and we derived the
membership and prototype update equations from necessary conditions for minimization of
our criterion function. In this paper, we show the ability of this approach to detect linear
and quadric curves in the presence of considerable noise.

I. Introduction


Clustering has long been a popular approach to unsupervised pattern recognition. It 
has become more attractive with the connection to neural networks, and with the increased 
attention to fuzzy clustering. In fact, recent advances in fuzzy clustering have shown 
spectacular ability to detect not only hypervolume clusters, but also clusters which are
actually "thin shells", i.e., curves and surfaces [1-7]. One of the major factors that 
influences the determination of appropriate groups of points is the "distance measure" 
chosen for the problem at hand. Fuzzy clustering has been shown to be advantageous over 
crisp (or traditional) clustering in that total commitment of a vector to a given class is not 
required at each iteration. 

Boundary detection and surface approximation are important components of 
intermediate-level vision. They are the first step in solving problems such as object 
recognition and orientation estimation. Recently, it has been shown that these problems can 
be viewed as clustering problems with appropriate distance measures and prototypes [1-7]. 
Dave's Fuzzy C Shells (FCS) algorithm [2] and the Fuzzy Adaptive C-Shells (FACS) 
algorithm [7] have proven to be successful in detecting clusters that can be described by 
circular arcs, or more generally by elliptical shapes. Unfortunately, these algorithms are 
computationally rather intensive since they involve the solution of coupled nonlinear 
equations for the shell (prototype) parameters. These algorithms also assume that the 
number of clusters is known. To overcome these drawbacks we recently proposed a
computationally simpler Fuzzy C Spherical Shells (FCSS) algorithm [6] for clustering 
hyperspherical shells and suggested an efficient algorithm to determine the number of 
clusters when this is not known. We also proposed the Fuzzy C Quadric Shells (FCQS) 
algorithm [5] which can detect more general quadric shapes. One problem with the FCQS 
algorithm is that it uses the algebraic distance, which is highly nonlinear. This results in 
unsatisfactory performance when the data is not very "clean" [7]. Finally, none of the
algorithms can handle situations in which the clusters include lines/planes and there is much
noise. In [8], we addressed those issues in a new approach called Plano-Quadric
Clustering. In this paper, we show how that algorithm, coupled with our new possibilistic
clustering, can accurately find linear and quadric curves in the presence of noise.

Most analytic fuzzy clustering approaches are derived from Bezdek's Fuzzy
C-Means (FCM) algorithm [9]. The FCM uses the probabilistic constraint that the
memberships of a data point across classes must sum to one. This constraint came from
generalizing a crisp C-Partition of a data set, and was used to generate the membership
update equations for an iterative algorithm. These equations emerge as necessary conditions
for a global minimum of a least-squares type of criterion function. Unfortunately, the
resulting memberships do not represent one's intuitive notion of degrees of belonging,
i.e., they do not represent degrees of "typicality" or "possibility".

There is another important motivation for using possibilistic memberships. Like all 
unsupervised techniques, clustering (crisp or fuzzy) suffers from the presence of noise in 
the data. Since most distance functions are geometric in nature, noise points, which are 
often quite distant from the primary clusters, can drastically influence the estimates of the 
class prototypes, and hence, the final clustering. Fuzzy methods ameliorate this problem 
when the number of classes is greater than one, since the noise points tend to have 
somewhat smaller membership values in all the classes. However, this difficulty still 
remains in the fuzzy case, since the memberships of unrepresentative (or noise) points can 
still be significantly high. In fact, if there is only one real cluster present in the data, there is 
essentially no difference between the crisp and fuzzy methods. 

On the other hand, if a set of feature vectors is thought of as the domain of 
discourse for a collection of independent fuzzy subsets, then there should be no constraint 
on the sum of the memberships. The only real constraint is that the assignments do really 
represent fuzzy membership values, i.e., they must lie in the interval [0,1]. In [10], we cast
the clustering problem into the framework of possibility theory. We briefly review this
approach, and show its superiority in recognizing shapes from noisy and incomplete data.

II. Possibilistic Clustering Algorithms 

The original FCM formulation minimizes the objective function given by

$$J(L, U) = \sum_{i=1}^{C} \sum_{j=1}^{N} (\mu_{ij})^m d_{ij}^2, \quad \text{subject to } \sum_{i=1}^{C} \mu_{ij} = 1 \text{ for all } j. \qquad (1)$$

In (1), $L = (A_1, \ldots, A_C)$ is a C-tuple of prototypes, $d_{ij}^2$ is the distance of feature point $x_j$ to
cluster $A_i$, N is the total number of feature vectors, C is the number of classes, and
$U = [\mu_{ij}]$ is a $C \times N$ matrix called the fuzzy C-partition matrix [9] satisfying the following
conditions:

$$\mu_{ij} \in [0,1] \text{ for all } i \text{ and } j, \quad \sum_{i=1}^{C} \mu_{ij} = 1 \text{ for all } j, \quad \text{and} \quad 0 < \sum_{j=1}^{N} \mu_{ij} < N \text{ for all } i.$$

Here, $\mu_{ij}$ is the grade of membership of the feature point $x_j$ in cluster $A_i$, and $m \in [1, \infty)$
is a weighting exponent called the fuzzifier. In what follows, $A_i$ will also be used to denote
the ith cluster, since it contains all of the parameters that define the prototype of the cluster.

Simply relaxing the constraint in (1) produces the trivial solution, i.e., the criterion
function is minimized by assigning all memberships to zero. Clearly, one would like the
memberships for representative feature points to be as high as possible, while
unrepresentative points should have low membership in all clusters. This is an approach
consistent with possibility theory [11]. The objective function which satisfies our
requirements may be formulated as:

$$J_m(L, U) = \sum_{i=1}^{C} \sum_{j=1}^{N} (\mu_{ij})^m d_{ij}^2 + \sum_{i=1}^{C} \eta_i \sum_{j=1}^{N} (1 - \mu_{ij})^m, \qquad (2)$$

where $\eta_i$ are suitable positive numbers. The first term demands that the distances from the
feature vectors to the prototypes be as low as possible, whereas the second term forces the
$\mu_{ij}$ to be as large as possible, thus avoiding the trivial solution. The following theorem,
proved in [9], gives necessary conditions for minimization, hence providing the basis for
an iterative algorithm.

Theorem:

Suppose that $X = \{x_1, \ldots, x_N\}$ is a set of feature vectors, $L = (A_1, \ldots, A_C)$ is a
C-tuple of prototypes, $d_{ij}^2$ is the distance of feature point $x_j$ to the cluster prototype $A_i$
(i = 1, ..., C; j = 1, ..., N), and $U = [\mu_{ij}]$ is a $C \times N$ matrix of possibilistic membership
values. Then U may be a global minimum for $J_m(L, U)$ only if

$$\mu_{ij} = \frac{1}{1 + \left( d_{ij}^2 / \eta_i \right)^{1/(m-1)}}.$$

The necessary conditions on the prototypes are identical to the corresponding conditions in
the FCM and its derivatives.
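
The membership condition can be checked with a short calculation. Because the memberships are no longer coupled by a sum constraint, each one can be minimized separately. The sketch below assumes m > 1 and a nonzero distance, and writes the part of the objective that depends on a single membership as the sum of a distance term and a penalty term weighted by the positive constant of cluster i.

```latex
\begin{align*}
J_{ij}(\mu_{ij}) &= \mu_{ij}^{m}\, d_{ij}^{2} + \eta_i\,(1-\mu_{ij})^{m}, \\
\frac{\partial J_{ij}}{\partial \mu_{ij}}
  &= m\,\mu_{ij}^{m-1} d_{ij}^{2} - m\,\eta_i\,(1-\mu_{ij})^{m-1} = 0, \\
\left(\frac{\mu_{ij}}{1-\mu_{ij}}\right)^{m-1} &= \frac{\eta_i}{d_{ij}^{2}}, \\
\mu_{ij} &= \frac{1}{1 + \left( d_{ij}^{2}/\eta_i \right)^{1/(m-1)}} .
\end{align*}
```

Setting the squared distance equal to the cluster constant gives a membership of exactly 0.5, whatever the value of m.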

Thus, in each iteration, the updated value of $\mu_{ij}$ depends only on the distance of $x_j$
from $A_i$, which is an intuitively pleasing result. The membership of a point in a cluster
should be determined solely by how far it is from the prototype of the class, and should not
be coupled to its location with respect to other classes. The updating of the prototypes
depends on the distance measure chosen, and will proceed exactly the same way as in the
case of the FCM algorithm and its derivatives.
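
A minimal sketch of this membership update is given below, assuming the standard possibilistic form in which the membership is 1 / (1 + (d^2 / eta)^(1/(m-1))). The default m = 2, the function name, and the example distances are assumptions for illustration.

```python
def possibilistic_memberships(sq_dists, etas, m=2.0):
    """mu[i][j] = 1 / (1 + (d_ij^2 / eta_i)^(1/(m-1))) for C clusters and N points.

    sq_dists[i][j] is the squared distance of point j to prototype i, and
    etas[i] is the positive scale constant of cluster i.
    """
    expo = 1.0 / (m - 1.0)
    return [[1.0 / (1.0 + (d2 / eta) ** expo) for d2 in row]
            for row, eta in zip(sq_dists, etas)]


sq_dists = [[0.0, 1.0, 4.0, 100.0]]  # hypothetical squared distances to one prototype
mu = possibilistic_memberships(sq_dists, etas=[1.0])
print(mu[0])  # memberships 1.0, 0.5, 0.2 and about 0.0099
```

Each row is computed independently of the others, so a distant noise point receives a low membership in every cluster instead of being forced to share a total of one.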

The value of $\eta_i$ determines the distance at which the membership value of a point in
a cluster becomes 0.5 (i.e., "the 3 dB point"). Thus, it needs to be chosen depending on
the desired "bandwidth" of the possibility (membership) distribution for each cluster. This
value could be the same for all clusters, if all clusters are expected to be similar. In general,
it is desirable that $\eta_i$ relates to the overall size and shape of cluster $A_i$. Also, it is to be
noted that $\eta_i$ determines the relative degree to which the second term in the objective
function is important compared to the first. If the two terms are to be weighted roughly
equally, then $\eta_i$ should be of the order of $d_{ij}^2$. In practice we find that the following
definition works best.

$$\eta_i = \frac{\sum_{j=1}^{N} (\mu_{ij})^m d_{ij}^2}{\sum_{j=1}^{N} (\mu_{ij})^m} \qquad (3)$$

This choice makes $\eta_i$ the average fuzzy intra-cluster distance of cluster $A_i$. The value of $\eta_i$
can be fixed for all iterations, or it may be varied in each iteration. When $\eta_i$ is varied in
each iteration, care must be exercised, since it may lead to instabilities. Our experience
shows that the final clustering is quite insensitive to large (an order of magnitude)
variations in the values of $\eta_i$.
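
The average fuzzy intra-cluster distance can be computed as sketched below, assuming it is the membership-weighted mean of the squared distances with weights equal to the memberships raised to the power m. The function name, the default m = 2, and the example values are hypothetical.

```python
def estimate_etas(mu, sq_dists, m=2.0):
    """Average fuzzy intra-cluster distance per cluster:
    sum_j mu_ij^m * d_ij^2 divided by sum_j mu_ij^m."""
    etas = []
    for mu_row, d_row in zip(mu, sq_dists):
        weights = [u ** m for u in mu_row]
        etas.append(sum(w * d2 for w, d2 in zip(weights, d_row)) / sum(weights))
    return etas


mu = [[1.0, 0.5]]        # hypothetical memberships of two points in one cluster
sq_dists = [[2.0, 4.0]]  # hypothetical squared distances to its prototype
print(estimate_etas(mu, sq_dists))  # [2.4]
```

A common way to use this, stated here as an assumption, is to take the memberships from an initial fuzzy partition, compute the values once, and keep them fixed during the possibilistic iterations.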

III. The Possibilistic C Plano-Quadric Shells Algorithm

Suppose that we are given a second degree curve $A_i$ characterized by a prototype
vector

$$p_i = [p_{i1}, p_{i2}, \ldots, p_{ir}]^T$$

to which it is desired to fit points $x_j$ obtained through the application of some edge
detection algorithm. If a point x has coordinates $[x_1, \ldots, x_n]$, then let

$$q = [x_1^2, x_2^2, \ldots, x_n^2, x_1 x_2, \ldots, x_{n-1} x_n, x_1, x_2, \ldots, x_n, 1]^T.$$

When the exact (geometric) distance has no closed-form solution, one of the methods
suggested in the literature is to use what is known as the "approximate distance", which is
the first-order approximation of the exact distance. It is easy to show [12] that the
approximate distance of a point from a curve is given by

$$d_{Aij}^2 = \frac{d_{Qij}^2}{|\nabla d_{Qij}|^2} = \frac{d_{Qij}^2}{p_i^T D_j D_j^T p_i}, \qquad (4)$$

where $\nabla d_{Qij}$ is the gradient of the distance functional

$$p_i^T q = [p_{i1}, p_{i2}, \ldots, p_{ir}]\,[x_1^2, x_2^2, \ldots, x_n^2, x_1 x_2, \ldots, x_{n-1} x_n, x_1, x_2, \ldots, x_n, 1]^T \qquad (5)$$

evaluated at $x_j$. In (4) the matrix $D_j$ is simply the Jacobian of q evaluated at $x_j$.
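
For the 2-D case the approximate distance can be sketched as follows. The sketch assumes that the algebraic distance is the square of the distance functional, that q is ordered as [x^2, y^2, xy, x, y, 1], and that the Jacobian has one row per component of q; the function names are hypothetical.

```python
def q_vector(x, y):
    """q = [x^2, y^2, xy, x, y, 1] for a 2-D point."""
    return [x * x, y * y, x * y, x, y, 1.0]


def jacobian(x, y):
    """Jacobian of q: one row per component of q, columns are d/dx and d/dy.
    The last row is always zero because the last component of q is constant."""
    return [[2 * x, 0.0], [0.0, 2 * y], [y, x], [1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]


def approx_sq_distance(p, x, y):
    """(p^T q)^2 / (p^T D D^T p): first-order approximate squared distance."""
    algebraic = sum(pk * qk for pk, qk in zip(p, q_vector(x, y)))
    D = jacobian(x, y)
    grad = [sum(p[k] * D[k][c] for k in range(6)) for c in range(2)]  # D^T p
    return algebraic ** 2 / sum(g * g for g in grad)


unit_circle = [1.0, 1.0, 0.0, 0.0, 0.0, -1.0]    # x^2 + y^2 - 1 = 0
vertical_line = [0.0, 0.0, 0.0, 1.0, 0.0, -1.0]  # x - 1 = 0

print(approx_sq_distance(unit_circle, 2.0, 0.0))    # 0.5625 (exact squared distance is 1)
print(approx_sq_distance(vertical_line, 3.0, 0.0))  # 4.0 (exact for a line)
```

The approximation is exact for lines and degrades gracefully for curved prototypes as the point moves away from the curve.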

One can easily reformulate the quadric shell clustering algorithm with $d_{Aij}$ as the
underlying distance measure. It was shown in [8] that the solution to the parameter
estimation problem is given by the generalized eigenvector problem

$$F_i p_i = l_i G_i p_i, \qquad (6)$$

where

$$F_i = \sum_{j=1}^{N} (\mu_{ij})^m M_j, \qquad G_i = \sum_{j=1}^{N} (\mu_{ij})^m D_j D_j^T,$$

which can be converted to the standard eigenvector problem if the matrix $G_i$ is not
rank-deficient. Unfortunately, $G_i$ is rank-deficient, since the last row of $D_j$ is always
[0, ..., 0]. Equation (6) can still be solved using other techniques that use the modified Cholesky
decomposition [13], and the solution is computationally quite inexpensive when the feature
space is 2-D or 3-D. Another advantage of this constraint is that it can also fit lines and
planes in addition to quadrics. Our experimental results show that the resulting algorithm,
which we call the Possibilistic C Plano-Quadric Shells (PCPQS) algorithm, is quite robust
in the presence of poorly defined boundaries (i.e., when the edge points are somewhat
scattered around the ideal boundary curve in the 2-D case and when the range values are not
very accurate in the 3-D case). It is also very immune to impulse noise and outliers. Of
course, if the type of curves required are restricted to a single type, e.g., lines, or circles,
or ellipses, simpler algorithms can be used with possibilistic updates, as will be seen.

IV. Determination of Number of Clusters 

The number of clusters C is not known a priori in some pattern recognition 
applications and most computer vision applications. When the number of clusters is 
unknown, one method to determine this number is to perform clustering for a range of C 
values, and pick the C value for which a suitable validity measure is minimized (or 
maximized) [14]. However this method is rather tedious, especially when the number of 
clusters is large. Also, in our experiments, we found that the C value obtained this way 
may not be optimum. This is because when C is large, the clustering algorithm sometimes 
converges to a local minimum of the objective function, and this may result in a bad value 
for the validity of the clustering, even though the value of C is correct. Moreover, when C 
is greater than the optimum number, the algorithm may split a single shell cluster into more 
than one cluster, and yet achieve a good value for the overall validity. To overcome these 
problems, we proposed in [8] an alternative Unsupervised C Shell Clustering algorithm 
which is computationally more efficient, since it does not perform the clustering for an 
entire range of C values. 

Our proposed method progressively clusters the data starting with an overspecified
number C_max of clusters. Initially, the FCPQS algorithm is run with C = C_max. After the
algorithm converges, spurious clusters (with low validity) are eliminated; compatible
clusters are merged; and points assigned to clusters with good validity are temporarily
removed from the data set to reduce computations. The FCPQS algorithm is invoked again
with the remaining feature points. The above procedure is repeated until no more
elimination, merging, or removing occurs, or until C = 1.
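
The control flow of this procedure can be sketched as follows. This is a minimal illustration, not the implementation from [8]. The `Cluster` container and the four callables (`run_clustering`, `is_spurious`, `merge_compatible`, `is_good`) are hypothetical placeholders for the FCPQS run and the validity tests. Resetting C to the number of clusters still pending is our assumption, and the toy data at the end only exercises the loop.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Cluster:
    """Hypothetical container: a fitted prototype plus the points assigned to it."""
    prototype: object
    members: List[int]


def unsupervised_shell_clustering(
    points: Sequence[int],
    c_max: int,
    run_clustering: Callable[[List[int], int], List[Cluster]],
    is_spurious: Callable[[Cluster], bool],
    merge_compatible: Callable[[List[Cluster]], List[Cluster]],
    is_good: Callable[[Cluster], bool],
) -> Tuple[List[Cluster], List[Cluster]]:
    """Return (accepted clusters, clusters still pending when the loop stops)."""
    remaining = list(points)
    accepted: List[Cluster] = []
    pending: List[Cluster] = []
    c = c_max
    while c >= 1 and remaining:
        clusters = run_clustering(remaining, c)          # stand-in for an FCPQS run
        survivors = [k for k in clusters if not is_spurious(k)]
        merged = merge_compatible(survivors)
        good = [k for k in merged if is_good(k)]
        pending = [k for k in merged if not is_good(k)]
        accepted.extend(good)
        # Temporarily remove the points that belong to good clusters.
        taken = {p for k in good for p in k.members}
        remaining = [p for p in remaining if p not in taken]
        changed = len(merged) < len(clusters) or bool(good)
        c = len(pending)                                 # assumption: C shrinks to what is left
        if not changed or c <= 1:
            break
    return accepted, pending


def toy_clustering(pts: List[int], c: int) -> List[Cluster]:
    """Toy stand-in: groups integers by tens and pads with empty clusters up to c."""
    groups = {}
    for p in pts:
        groups.setdefault(p // 10, []).append(p)
    found = [Cluster(prototype=g, members=m) for g, m in sorted(groups.items())][:c]
    found += [Cluster(prototype=None, members=[]) for _ in range(c - len(found))]
    return found


data = [0, 1, 2, 3, 10, 11, 12, 13, 20, 21, 22, 23]
accepted, pending = unsupervised_shell_clustering(
    data, 5, toy_clustering,
    is_spurious=lambda k: not k.members,
    merge_compatible=lambda ks: ks,
    is_good=lambda k: len(k.members) >= 4,
)
print([k.prototype for k in accepted])
```

With C_max = 5 and three true groups, the two empty clusters are discarded as spurious and the three remaining ones are accepted in a single pass.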

V. Examples of Possibilistic Clustering for Shape Recognition 

Figures 1 and 2 show the detection of a circular "fractal edge" from a
synthetically generated image. Figure 1(a) is the original composite fractal image; figure
1(b) shows what a gray-scale edge operator finds (or doesn't find); figure 1(c) is the output
of the horizontal fractal edge operator; and figure 1(d) gives the maximum overall
response of the fractal operators in four directions. Figure 2(a) depicts the (noisy)
thresholded and thinned result from figure 1(d). Figure 2(b) gives the final prototype found
by the FCPQS algorithm (which, since there is only one cluster present, is the same as the
crisp version). Note how the presence of noise distorts the final prototype. Figure 2(c)
shows the possibilistic algorithm output, which is superimposed on the original image in
figure 2(d). The results of the PCPQS algorithm are virtually unaffected by noise. Several
examples comparing crisp, fuzzy and possibilistic versions of clustering can be found in
[6, 8, 10].

Figure 3 depicts the algorithm applied to the image of a model of the Space Shuttle.
Figure 3(a) is the original image. Figure 3(b) gives the output of a typical edge operator.
Note that, due to the rather poor quality of the original image, the edges found are both
noisy and incomplete. This data was then input into the possibilistic plano-quadric
clustering algorithm. Figure 3(c) gives the eight complete prototypes which were found after
running the algorithm. Finally, figure 3(d) displays the prototypes drawn only where
sufficient edge points exist.



VI. Conclusions 



In this paper, we demonstrated how our new possibilistic approach to
objective-function-based clustering, coupled with our plano-quadric shells algorithm, can
recognize first and second degree shapes from incomplete and noisy edge data. This approach
is superior to both crisp and fuzzy clustering, as well as to traditional methods such as the
Hough Transform. Extensions of this approach to other classes of shapes are currently
underway.
Figure 1. Detection of a fractal circular edge.

(a) Upper Left. Original fractal composite image. 

(b) Upper Right. Output of gray scale edge operator. 

(c) Lower Left. Output of "horizontal" fractal edge operator. 

(d) Lower Right. Maximum magnitude of the outputs of the fractal operators in four directions.



Figure 2. Recognition of circular boundary.

(a) Upper Left. Figure 1 (d) thresholded and thinned. 

(b) Upper Right. Circular prototype found by fuzzy (or crisp) clustering. 

(c) Lower Left. Circular prototype found by possibilistic clustering. 

(d) Lower Right. Possibilistic prototype superimposed on original image. 




Figure 3. Recognition of Shuttle model boundaries. 

(a) Upper Left. Original Shuttle image. 

(b) Upper Right. Incomplete and noisy edges found by edge operator. 

(c) Lower Left. Prototypes found by Possibilistic Plano-Quadric clustering.

(d) Lower Right. Possibilistic prototypes superimposed, drawn only where there is sufficient edge information.






Pose Estimation Using Possibilistic Clustering 


In the Third Quarter report, we described how the Unsupervised C Quadric Shells
(UCQS) algorithm could be used to estimate the pose of the shuttle. The shuttle's image is
taken from the back so that the exhaust nozzles and the back edges of the three wings are
apparent. Given an original unrotated image, the exhaust nozzles can be parametrized by
three circles, and the three wings can be parametrized by three straight lines. These
parameters are easily determined by the UCQS algorithm. As the shuttle rotates, the shape
of the nozzles will change from circles to ellipses, and the orientation of the straight
lines representing the three wings will change as well. The UCQS algorithm is used to
cluster this edge image and determine the parameters of the ellipses and lines. Finally,
these parameters can be used to solve for the translation and rotation parameters, as long
as the translation is made in the image plane. In fact, depth information can also be
derived from the change in the size of the nozzles.

We also consider the case where only line information is available. Once again, our
new possibilistic plano-quadric clustering approach is used to detect and recognize the
linear segments. In what follows, the derivation of the pose parameters is given both for
the case where three corresponding line segments have been identified and for the case
where one circle and one line have been matched.


POSE ESTIMATION: 


The 3-D object attitude in space can be determined from a single perspective image. 
Dhome et al. [1] developed a method to solve for the three-dimensional attitude of an object
based on the perspective projection of three image lines. Krishnapuram & Casasent [2] 
developed a method for determining two of the three rotation angles necessary to describe 
an object attitude in 3-D space from a single perspective projection of one circle. 

I. Determination of The Attitude of One Object From Three Lines: 

The perspective projection of a point P_i = (X_i, Y_i, Z_i) on an image is the point p_i
= (x_i, y_i, z_i) = (X_i f/Z_i, Y_i f/Z_i, f). Let l_i be an image line characterized by a
vector v_i = (a_i, b_i, 0) and a point p_i = (x_i, y_i, f). l_i is the perspective
projection of a space line L_i. Therefore it lies in the "interpretation plane" containing
the origin of the coordinate system O and the image line l_i. The normal N_i to this plane
is perpendicular to v_i and the vector Op_i. Thus N_i = v_i × Op_i = (b_i f, -a_i f, d_i)^T,
where d_i = a_i y_i - b_i x_i is the Euclidean distance between the center of the image and
line l_i. If V_i = (A_i, B_i, C_i)^T is the director vector of the space line L_i, then it
must be orthogonal to N_i, hence V_i · N_i = 0, implying that:

\[ (A_i, B_i, C_i)^T \cdot (b_i, -a_i, d_i/f)^T = 0 \tag{1} \]
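
As a quick numerical check of equation (1), the sketch below projects two points of a space line, builds the image-line vector and the interpretation-plane normal, and evaluates the orthogonality residual. The helper names (`cross`, `dot`, `project`, `interpretation_plane_normal`) and the sample numbers are our own choices for illustration. The image point is taken at depth f, as in the projection formula.

```python
def cross(u, v):
    """Cross product of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    """Dot product of two 3-vectors."""
    return sum(a * b for a, b in zip(u, v))


def project(P, f):
    """Perspective projection of P = (X, Y, Z) onto the image plane z = f."""
    X, Y, Z = P
    return (X * f / Z, Y * f / Z, f)


def interpretation_plane_normal(a, b, x, y, f):
    """N = v x Op with v = (a, b, 0) and p = (x, y, f)."""
    return cross((a, b, 0.0), (x, y, f))


f = 2.0                      # focal length (arbitrary sample value)
P0 = (1.0, 2.0, 10.0)        # a point on the space line
V = (1.0, 1.0, 2.0)          # director vector (A, B, C) of the space line
P1 = tuple(p + d for p, d in zip(P0, V))

p0, p1 = project(P0, f), project(P1, f)
a, b = p1[0] - p0[0], p1[1] - p0[1]          # direction of the image line
N = interpretation_plane_normal(a, b, p0[0], p0[1], f)
d = a * p0[1] - b * p0[0]

residual = dot(V, (b, -a, d / f))            # left-hand side of equation (1)
print(N, residual)
```

The residual vanishes (up to rounding) whether or not (a, b) is normalized, since equation (1) is homogeneous in the image-line parameters.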

Consider three object lines in 3-D space L_0i, i = 1, ..., 3, defined in a model reference
frame (S_0m). The director vector of L_0i is V_0i = (A_0i, B_0i, C_0i)^T. When the object is
rotated in 3-D space, the lines L_0i are rotated into lines L_3i. Therefore

\[ (A_{3i}, B_{3i}, C_{3i})^T = R_{\alpha\beta\gamma} (A_{0i}, B_{0i}, C_{0i})^T \tag{2} \]

where R_αβγ is the rotation matrix.

The perspective projections of lines L_3i are the lines l_0i. Equation (1) becomes

\[ (A_{3i}, B_{3i}, C_{3i})^T \cdot (b_{0i}, -a_{0i}, d_{0i}/f)^T
   = R_{\alpha\beta\gamma} (A_{0i}, B_{0i}, C_{0i})^T \cdot (b_{0i}, -a_{0i}, d_{0i}/f)^T = 0 \tag{3} \]


where i = 1, ..., 3 and α, β, and γ are the unknown rotation angles about the x, y, and z
axes, respectively. Solving this system of equations directly is too complicated. A
specially defined model coordinate system (S_1m) and a corresponding viewer coordinate
system (S_1v) can be used to simplify the problem [1]. With these coordinate systems, only
two rotation angles α and β need to be determined, i.e., the system of equations (3) can be
reduced to two equations in two unknowns. First, α is found by iteratively solving an
8th-order equation. Then β is solved for by substitution. When the three lines are
coplanar, or when they form a junction, the 8th-order equation reduces to a 4th-order
equation.

II. Determination of the Attitude of an Object From a Circle and a Line:


Consider a circular curve on the x-y plane, and an x'-y' view of this curve in a
different coordinate system (x', y', z'). The two frames (x, y, z) and (x', y', z') are
related by a homogeneous transformation T, such that

\[ (x, y, z, 1)^T = T \, (x', y', z', 1)^T. \]

A circle of radius r on the xy plane is described by:

\[ x^2 + y^2 = r^2 \tag{4} \]
\[ z = 0 \tag{5} \]

In the (x', y', z') frame, equations (4) and (5) become

\[ (t_{11} x' + t_{12} y' + t_{13} z')^2 + (t_{21} x' + t_{22} y' + t_{23} z')^2 = r^2 \tag{6} \]
\[ t_{31} x' + t_{32} y' + t_{33} z' = 0 \tag{7} \]


Substituting z' in terms of x' and y' from equation (7) into equation (6) yields the equation
for the 2-D projection of the 3-D circular curve onto an arbitrary x'-y' plane. Making use of
the fact that the columns of T are mutually orthogonal unit vectors, we obtain

(8)

This is the equation of an ellipse in the (x', y', z') frame. If the parameters of this
ellipse are known, equation (8) can be solved for the transformation parameters t_31, t_32,
and t_33. The transformation matrix T can be written as a function of the rotation angles
α, β, and γ:


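\[
T = \begin{bmatrix}
\cos\gamma\cos\beta & \cos\gamma\sin\beta\sin\alpha - \sin\gamma\cos\alpha & \cos\gamma\sin\beta\cos\alpha + \sin\gamma\sin\alpha & 0 \\
\sin\gamma\cos\beta & \sin\gamma\sin\beta\sin\alpha + \cos\gamma\cos\alpha & \sin\gamma\sin\beta\cos\alpha - \cos\gamma\sin\alpha & 0 \\
-\sin\beta & \cos\beta\sin\alpha & \cos\beta\cos\alpha & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
\]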


Having already solved for t_31, t_32, and t_33, α and β can be easily determined from the
3rd row of T.
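
For readers who want to see how the ellipse arises, the following is a sketch of the algebra, assuming T is a pure rotation (no translation), as the form of T given here suggests. The final conic is our own rearrangement and may differ from the way equation (8) is written in the original. The angle formulas additionally assume cos β > 0 and t_33 ≠ 0.

```latex
% Rows of a rotation are orthonormal, so the three squared row products sum to |x'|^2:
(t_{11}x' + t_{12}y' + t_{13}z')^2 + (t_{21}x' + t_{22}y' + t_{23}z')^2
  = x'^2 + y'^2 + z'^2 - (t_{31}x' + t_{32}y' + t_{33}z')^2

% The last term vanishes by (7), so (6) reduces to
x'^2 + y'^2 + z'^2 = r^2

% Eliminating z' with (7):
z' = -\frac{t_{31}x' + t_{32}y'}{t_{33}}
\;\Longrightarrow\;
(t_{31}^2 + t_{33}^2)\,x'^2 + 2\,t_{31}t_{32}\,x'y' + (t_{32}^2 + t_{33}^2)\,y'^2 = r^2\,t_{33}^2

% Reading the angles off the third row of T:
(t_{31},\, t_{32},\, t_{33}) = (-\sin\beta,\ \cos\beta\sin\alpha,\ \cos\beta\cos\alpha)
\;\Longrightarrow\;
\beta = -\arcsin t_{31}, \qquad \alpha = \operatorname{atan2}(t_{32},\, t_{33})
```

The discriminant of the conic is -4 t_33^2 (using t_31^2 + t_32^2 + t_33^2 = 1), which is negative whenever t_33 ≠ 0, confirming that the curve is an ellipse.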

In order to determine the 3rd angle γ, a line can be used in addition to the circle. In this
case the two rotation angles α and β can be determined as discussed previously. Knowing
these two angles, equation (3) with i = 1 (since we have only one line) becomes simple to
solve, since the only unknown is γ.
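
The last step can be sketched numerically. With α and β known, equation (3) for a single line has the form P cos γ + Q sin γ + R = 0, which has up to two solutions. The sketch below assumes the rotation in equation (2) is composed as in the matrix T above (rotation about x, then y, then z). The function names and test values are hypothetical, and `n` stands for the vector (b, -a, d/f) of the matched image line.

```python
import math


def angles_from_third_row(t31, t32, t33):
    """Recover (alpha, beta) from the third row of T, assuming cos(beta) > 0."""
    beta = -math.asin(t31)
    alpha = math.atan2(t32, t33)
    return alpha, beta


def rotate_x_then_y(v, alpha, beta):
    """Apply the rotation about x by alpha, then the rotation about y by beta."""
    x, y, z = v
    ca, sa = math.cos(alpha), math.sin(alpha)
    y, z = ca * y - sa * z, sa * y + ca * z
    cb, sb = math.cos(beta), math.sin(beta)
    x, z = cb * x + sb * z, -sb * x + cb * z
    return (x, y, z)


def solve_gamma(alpha, beta, V0, n):
    """Both roots of P*cos(g) + Q*sin(g) + R = 0, i.e. equation (3) for one line."""
    w1, w2, w3 = rotate_x_then_y(V0, alpha, beta)
    n1, n2, n3 = n
    P = w1 * n1 + w2 * n2
    Q = w1 * n2 - w2 * n1
    R = w3 * n3
    rho = math.hypot(P, Q)
    if rho == 0.0:
        raise ValueError("gamma is not constrained by this line")
    phi = math.atan2(Q, P)
    delta = math.acos(max(-1.0, min(1.0, -R / rho)))
    return (phi + delta, phi - delta)


# Synthetic check with made-up angles and a made-up model line direction.
alpha_true, beta_true, gamma_true = 0.3, -0.2, 0.5
row3 = (-math.sin(beta_true),
        math.cos(beta_true) * math.sin(alpha_true),
        math.cos(beta_true) * math.cos(alpha_true))
alpha, beta = angles_from_third_row(*row3)

V0 = (1.0, 2.0, 3.0)
w = rotate_x_then_y(V0, alpha_true, beta_true)
cg, sg = math.cos(gamma_true), math.sin(gamma_true)
V3 = (cg * w[0] - sg * w[1], sg * w[0] + cg * w[1], w[2])   # rotated line direction
u = (1.0, 0.5, 2.0)                                         # any vector not parallel to V3
n = (V3[1] * u[2] - V3[2] * u[1],                           # n = V3 x u is orthogonal to V3
     V3[2] * u[0] - V3[0] * u[2],
     V3[0] * u[1] - V3[1] * u[0])

candidates = solve_gamma(alpha, beta, V0, n)
best_error = min(abs(math.remainder(g - gamma_true, 2.0 * math.pi)) for g in candidates)
print(candidates, best_error)
```

Because the equation has two roots in general, some additional cue (for example, a second line or a visibility constraint) would be needed to pick the correct one; the sketch only checks that the true angle is among the candidates.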
Testing the Algorithms


Besides the examples shown in the earlier reports, and the accompanying papers, 
we conclude this report with several examples of the results of our research. 

Examples of Determination of Lines 
from Different Orientations of the Shuttle 
by Possibilistic Clustering 

The following two pages show the use of the Possibilistic Plano-Quadric
Clustering Algorithm to identify the lines on images of the Shuttle. The gray scale images
were synthetically generated by Lincom. A simple edge detector was run on the images.
Thresholded output of the edge data was then sent to the unsupervised clustering algorithm
(which also determines the optimum number of clusters). The prototypes which were
identified are then displayed. After matching is performed, the approach described above
could be used to determine the rotation angles to specify the pose of the second image
relative to the reference model (first image). A complete solution to this problem is being
proposed for a second year effort.














Further examples of recognition of linear and quadric curves 
The following images were shown at the NASA Workshop, although they were not 
included in the paper which is to appear in the proceedings. In each case, the original image 
was processed by an appropriate edge operator, the results were thresholded, and the 
resulting edge points were used as input to the possibilistic clustering algorithm. 






































